fix(worker): Cap bundle analysis processor at 10 attempts and fix retry logic #688
Conversation
…ry logic

Cap total attempts at 10 (not 10 retries + 1) for BundleAnalysisProcessorTask and LockManager so we stop after 10 tries. Add a Redis-backed attempt counter in LockManager for lock contention so broker re-deliveries with unchanged headers do not retry indefinitely. BaseCodecovTask._has_exceeded_max_attempts now takes max_attempts and compares it to attempts (retries + 1, or the header value). On a generic exception in the bundle processor, return and set the upload to error instead of re-raising, to avoid unbounded retries.

Update tests: mock request for _has_exceeded_max_attempts, set mock_redis.incr.return_value where LockManager compares attempts, and adjust the cleanup test to expect a return instead of a raised ValueError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
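The "max total attempts" semantics described in this commit can be sketched as follows. This is a hypothetical illustration, not the actual BaseCodecovTask code: the function name mirrors _has_exceeded_max_attempts, but the signature and header handling here are assumptions.

```python
# Sketch (assumed shapes) of the attempt-cap check: attempts is Celery
# retries + 1, unless the broker re-delivered the message with an
# explicit "attempts" header that should take precedence.

def has_exceeded_max_attempts(request_retries: int, headers: dict, max_attempts: int) -> bool:
    attempts = headers.get("attempts", request_retries + 1)
    return attempts >= max_attempts

# 10th execution (9 prior retries) with max_attempts=10: stop here,
# so the cap means "10 attempts total", not "10 retries + 1".
has_exceeded_max_attempts(request_retries=9, headers={}, max_attempts=10)   # -> True
has_exceeded_max_attempts(request_retries=8, headers={}, max_attempts=10)   # -> False
```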
- LockManager: extract _clear_lock_attempt_counter to remove nested try in locked()
- Upload finisher: log max_attempts as UPLOAD_PROCESSOR_MAX_RETRIES (not +1)
- lock_manager: comment TTL intent instead of restating 24h
- Tests: remove hard-coded (10) from comments; use max_attempts wording

Co-authored-by: Cursor <cursoragent@cursor.com>
- BaseCodecovTask: doc and safe_retry use max_retries; drop max_attempts property
- LockRetry: max_attempts -> max_retries (same semantics: max total attempts)
- LockManager/bundle_analysis_processor/upload_finisher: log and Sentry use max_retries
- Tests: LockRetry(max_retries=...); comments say max_retries
- celery_config: one-line convention (max_retries = max total attempts)
- Fix duplicate dict keys in lock_manager and upload_finisher

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
…ssor-at-10-attempts-and-fix

Resolve lock_manager conflict: keep the attempt counter and max_retries logic; use self.base_retry_countdown (from main) for the countdown calculation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #688 +/- ##
==========================================
- Coverage 92.37% 92.36% -0.01%
==========================================
Files 1301 1301
Lines 47782 47793 +11
Branches 1613 1613
==========================================
+ Hits 44139 44145 +6
- Misses 3334 3339 +5
Partials 309 309
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
CodSpeed Performance Report

Merging this PR will not alter performance.
…test

LockManager uses redis incr for the attempt count; the mock must return an int so the attempts >= max_retries comparison does not raise TypeError.

Refs CCMRG-2042
Co-authored-by: Cursor <cursoragent@cursor.com>
When LockManager's Redis attempt counter hits max_retries before the task's attempt count (e.g. on re-deliveries), it raises LockRetry(max_retries_exceeded=True, countdown=0). ManualTriggerTask only checked self._has_exceeded_max_attempts(), so it fell through to self.retry(countdown=0) and caused rapid zero-delay retries.

Align with preprocess_upload and other callers: check retry.max_retries_exceeded or self._has_exceeded_max_attempts() and return a failure dict when either is true. Add a test for the Redis-counter path.

Co-authored-by: Cursor <cursoragent@cursor.com>
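The caller-side pattern this commit aligns on can be sketched like this. Everything here is an assumed shape for illustration: the LockRetry fields mirror the PR, but handle_lock_retry and the exact failure-dict keys are hypothetical.

```python
# Sketch of the "check both stop signals" pattern: a task should stop
# when either the lock manager's Redis counter tripped (carried on the
# LockRetry exception) or its own attempt cap was reached. Otherwise it
# retries with the countdown the lock manager computed.

class LockRetry(Exception):
    def __init__(self, countdown: int, max_retries_exceeded: bool = False):
        self.countdown = countdown
        self.max_retries_exceeded = max_retries_exceeded

def handle_lock_retry(retry: LockRetry, task_attempts_exceeded: bool):
    """Return a failure dict when either stop signal is set, else None (caller retries)."""
    if retry.max_retries_exceeded or task_attempts_exceeded:
        return {"successful": False, "reason": "max_retries_exceeded"}
    return None  # caller falls through to self.retry(countdown=retry.countdown)

# Redis-counter path: returns the failure dict instead of a zero-delay retry.
handle_lock_retry(LockRetry(0, max_retries_exceeded=True), False)
```

Checking only the task-side cap, as ManualTriggerTask did, misses the first signal and turns countdown=0 re-deliveries into a hot retry loop.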
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    _set_upload_error_and_commit(
        db_session, upload, commitid, repoid, log_suffix=" fallback"
    )
    return previous_result
Inconsistent return bypasses defensive isinstance check
Low Severity
The retryable-error max-retries-exceeded path at line 258 returns previous_result directly, while the general exception path at line 295 returns processing_results. Since processing_results is defined as previous_result if isinstance(previous_result, list) else [], these diverge when previous_result is unexpectedly not a list — the general exception path safely returns [], but the retryable error path returns the raw non-list value, bypassing the defensive isinstance guard. The old code consistently used processing_results in both paths.
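The divergence Bugbot describes comes down to one defensive guard. A minimal sketch (the helper name is hypothetical; the PR inlines this expression):

```python
# The general exception path returns this sanitized value; the
# retryable-error path returns the raw previous_result, so the two
# diverge exactly when an upstream task passed something non-list.

def sanitize(previous_result):
    # Defensive guard: the chain should pass a list, but fall back to []
    # if an upstream task returned something unexpected.
    return previous_result if isinstance(previous_result, list) else []

sanitize([{"upload": 1}])  # -> [{"upload": 1}]
sanitize(None)             # -> []; returning raw None would bypass the guard
```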


Cap the bundle analysis processor at 10 total attempts (not 10 retries + 1) and fix infinite retries when the broker re-delivers with unchanged headers.
What:

- BundleAnalysisProcessorTask and LockManager now treat the configured value as max total attempts (e.g. 10 = 10 attempts, then stop).
- BaseCodecovTask._has_exceeded_max_attempts(max_attempts) compares attempts (from retries + 1 or headers["attempts"]) to max_attempts.
- LockManager uses a Redis-backed attempt counter for lock contention so re-deliveries without new headers don't retry forever.
- On a generic exception in the processor we return and set the upload to error instead of re-raising, so the task is acknowledged and we avoid unbounded retries.

Why: Prevents runaway retries and aligns behavior with a clear "10 attempts then stop" cap; fixes cases where the broker re-delivers with the same headers, so Celery's retry count doesn't increase.

Tests: Updated unit and integration tests: mock request for _has_exceeded_max_attempts, set mock_redis.incr.return_value where LockManager compares attempts, and adjusted the cleanup-with-None-result test to expect return + upload error instead of a raised ValueError.

Refs CCMRG-2042
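The Redis-backed attempt counter can be sketched as below. The key format, function name, and TTL handling are assumptions for illustration; only incr/expire being the underlying Redis operations comes from the PR.

```python
# Sketch of the Redis-backed attempt counter. INCR is atomic, so each
# broker re-delivery bumps the count even when Celery's retry header
# never changes; the TTL keeps a stale counter from capping an
# unrelated future run of the same lock.

LOCK_ATTEMPT_TTL = 60 * 60 * 24  # intent: outlive any realistic retry window

def count_lock_attempt(redis_client, lock_name: str, max_retries: int) -> bool:
    """Return True once this lock has been attempted max_retries times."""
    key = f"lock_attempts:{lock_name}"  # hypothetical key format
    attempts = redis_client.incr(key)
    if attempts == 1:
        redis_client.expire(key, LOCK_ATTEMPT_TTL)  # set TTL on first attempt only
    return attempts >= max_retries  # True -> raise LockRetry(max_retries_exceeded=True)
```

On a True result the lock manager raises LockRetry with max_retries_exceeded=True; per the Overview below, the counter is cleared again once the lock is successfully used.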
Note
Medium Risk
Changes core retry/locking behavior and alters bundle analysis processor failure semantics (return vs raise), which could affect task completion and error propagation. Risk is mitigated by extensive unit/integration test updates and mostly scoped to worker retry paths.
Overview

Fixes runaway retries by redefining max_retries across worker tasks as maximum total attempts (stop when attempts >= max_retries) and updating BaseCodecovTask._has_exceeded_max_attempts / safe_retry and config comments accordingly.

LockManager now tracks lock-acquisition attempts via a Redis incr counter keyed per lock (with TTL) so tasks can stop retrying even when broker re-deliveries don't bump Celery retry counts; it also clears this counter on successful lock usage and updates LockRetry, logs, and Sentry context to report max_retries/attempts.

BundleAnalysisProcessorTask (and several other lock-using tasks like manual_trigger, preprocess_upload, upload_finisher, and the notifier/finisher tasks) now honor the lock manager's max_retries_exceeded signal in addition to task attempt caps. The bundle analysis processor additionally refactors repeated logging/DB error-state commits and, on generic processing exceptions, sets the upload to error and returns instead of re-raising, to avoid unbounded retries and keep chains progressing with previous_result.

Tests are updated to reflect Redis attempt counting (mock_redis.incr), the new attempt-boundary semantics, and the changed processor behavior (returning previous_result / not raising in some failure cases).

Written by Cursor Bugbot for commit 5a7c73f. This will update automatically on new commits.